5 research outputs found

    Richer object representations for object class detection in challenging real world images

    Get PDF
    Object class detection in real world images has been a synonym for object localization for the longest time. State-of-the-art detection methods, inspired by renowned detection benchmarks, typically target 2D bounding box localization of objects. At the same time, due to the rapid technological and scientific advances, high-level vision applications, aiming at understanding the visual world as a whole, are coming into the focus. The diversity of the visual world challenges these applications in terms of representational complexity, robust inference and training data. As objects play a central role in any vision system, it has been argued that richer object representations, providing higher level of detail than modern detection methods, are a promising direction towards understanding visual scenes. Besides bridging the gap between object class detection and high-level tasks, richer object representations also lead to more natural object descriptions, bringing computer vision closer to human perception. Inspired by these prospects, this thesis explores four different directions towards richer object representations, namely, 3D object representations, fine-grained representations, occlusion representations, as well as understanding convnet representations. Moreover, this thesis illustrates that richer object representations can facilitate high-level applications, providing detailed and natural object descriptions. In addition, the presented representations attain high performance rates, at least on par or often superior to state-of-the-art methods.Detektion von Objektklassen in natürlichen Bildern war lange Zeit gleichbedeutend mit Lokalisierung von Objekten. Von anerkannten Detektions-Benchmarks inspirierte Detektionsmethoden, die auf dem neuesten Stand der Forschung sind, zielen üblicherweise auf die Lokalisierung von Objekten im Bild. Gleichzeitig werden durch den schnellen technologischen und wissenschaftlichen Fortschritt abstraktere Bildverarbeitungsanwendungen, die ein Verständnis der visuellen Welt als Ganzes anstreben, immer interessanter. Die Diversität der visuellen Welt ist eine Herausforderung für diese Anwendungen hinsichtlich der Komplexität der Darstellung, robuster Inferenz und Trainingsdaten. Da Objekte eine zentrale Rolle in jedem Visionssystem spielen, wurde argumentiert, dass reichhaltige Objektrepräsentationen, die höhere Detailgenauigkeit als gegenwärtige Detektionsmethoden bieten, ein vielversprechender Schritt zum Verständnis visueller Szenen sind. Reichhaltige Objektrepräsentationen schlagen eine Brücke zwischen der Detektion von Objektklassen und abstrakteren Aufgabenstellungen, und sie führen auch zu natürlicheren Objektbeschreibungen, wodurch sie die Bildverarbeitung der menschlichen Wahrnehmung weiter annähern. Aufgrund dieser Perspektiven erforscht die vorliegende Arbeit vier verschiedene Herangehensweisen zu reichhaltigeren Objektrepräsentationen

    Richer object representations for object class detection in challenging real world images

    Get PDF
    Object class detection in real world images has been a synonym for object localization for the longest time. State-of-the-art detection methods, inspired by renowned detection benchmarks, typically target 2D bounding box localization of objects. At the same time, due to the rapid technological and scientific advances, high-level vision applications, aiming at understanding the visual world as a whole, are coming into the focus. The diversity of the visual world challenges these applications in terms of representational complexity, robust inference and training data. As objects play a central role in any vision system, it has been argued that richer object representations, providing higher level of detail than modern detection methods, are a promising direction towards understanding visual scenes. Besides bridging the gap between object class detection and high-level tasks, richer object representations also lead to more natural object descriptions, bringing computer vision closer to human perception. Inspired by these prospects, this thesis explores four different directions towards richer object representations, namely, 3D object representations, fine-grained representations, occlusion representations, as well as understanding convnet representations. Moreover, this thesis illustrates that richer object representations can facilitate high-level applications, providing detailed and natural object descriptions. In addition, the presented representations attain high performance rates, at least on par or often superior to state-of-the-art methods.Detektion von Objektklassen in natürlichen Bildern war lange Zeit gleichbedeutend mit Lokalisierung von Objekten. Von anerkannten Detektions-Benchmarks inspirierte Detektionsmethoden, die auf dem neuesten Stand der Forschung sind, zielen üblicherweise auf die Lokalisierung von Objekten im Bild. Gleichzeitig werden durch den schnellen technologischen und wissenschaftlichen Fortschritt abstraktere Bildverarbeitungsanwendungen, die ein Verständnis der visuellen Welt als Ganzes anstreben, immer interessanter. Die Diversität der visuellen Welt ist eine Herausforderung für diese Anwendungen hinsichtlich der Komplexität der Darstellung, robuster Inferenz und Trainingsdaten. Da Objekte eine zentrale Rolle in jedem Visionssystem spielen, wurde argumentiert, dass reichhaltige Objektrepräsentationen, die höhere Detailgenauigkeit als gegenwärtige Detektionsmethoden bieten, ein vielversprechender Schritt zum Verständnis visueller Szenen sind. Reichhaltige Objektrepräsentationen schlagen eine Brücke zwischen der Detektion von Objektklassen und abstrakteren Aufgabenstellungen, und sie führen auch zu natürlicheren Objektbeschreibungen, wodurch sie die Bildverarbeitung der menschlichen Wahrnehmung weiter annähern. Aufgrund dieser Perspektiven erforscht die vorliegende Arbeit vier verschiedene Herangehensweisen zu reichhaltigeren Objektrepräsentationen

    Multi-View Priors for Learning Detectors from Sparse Viewpoint Data

    Full text link
    While the majority of today's object class models provide only 2D bounding boxes, far richer output hypotheses are desirable including viewpoint, fine-grained category, and 3D geometry estimate. However, models trained to provide richer output require larger amounts of training data, preferably well covering the relevant aspects such as viewpoint and fine-grained categories. In this paper, we address this issue from the perspective of transfer learning, and design an object class model that explicitly leverages correlations between visual features. Specifically, our model represents prior distributions over permissible multi-view detectors in a parametric way -- the priors are learned once from training data of a source object class, and can later be used to facilitate the learning of a detector for a target class. As we show in our experiments, this transfer is not only beneficial for detectors based on basic-level category representations, but also enables the robust learning of detectors that represent classes at finer levels of granularity, where training data is typically even scarcer and more unbalanced. As a result, we report largely improved performance in simultaneous 2D object localization and viewpoint estimation on a recent dataset of challenging street scenes.Comment: 13 pages, 7 figures, 4 tables, International Conference on Learning Representations 201

    3D Object Class Detection in the Wild

    Full text link
    Object class detection has been a synonym for 2D bounding box localization for the longest time, fueled by the success of powerful statistical learning techniques, combined with robust image representations. Only recently, there has been a growing interest in revisiting the promise of computer vision from the early days: to precisely delineate the contents of a visual scene, object by object, in 3D. In this paper, we draw from recent advances in object detection and 2D-3D object lifting in order to design an object class detector that is particularly tailored towards 3D object class detection. Our 3D object class detection method consists of several stages gradually enriching the object detection output with object viewpoint, keypoints and 3D shape estimates. Following careful design, in each stage it constantly improves the performance and achieves state-ofthe-art performance in simultaneous 2D bounding box and viewpoint estimation on the challenging Pascal3D+ dataset
    corecore